Mathematics Prerequisites

Published

Last modified: 2026-01-08 05:29:42 (UTC)

1 Mathematics


Math is not just a way of calculating numerical answers; it is a way of thinking, using clear definitions for concepts and rigorous logic to organize our thoughts and back up our assertions.

Cheng (2025)


These lecture notes use:

  • algebra
  • precalculus
  • univariate calculus
  • linear algebra
  • vector calculus

Some key results are listed here.

1.1 Elementary Algebra

Mastery of Elementary Algebra (a.k.a. “College Algebra”) is a prerequisite for calculus, which is a prerequisite for Epi 202 and Epi 203, which are prerequisites for this course (Epi 204). Nevertheless, each year, some Epi 204 students are still uncomfortable with algebraic manipulations of mathematical formulas. Therefore, I include this section as a quick reference.

1.1.1 Equalities

Theorem 1 (Equalities are transitive) If \(a=b\) and \(b=c\), then \(a=c\)


Theorem 2 (Substituting equivalent expressions) If \(a = b\), then for any function \(f(x)\), \(f(a) = f(b)\)
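For example, Theorem 2 is what licenses applying the same operation to both sides of an equation when solving it:

\[
\begin{aligned}
x + 2 &= 5 \\
(x + 2) - 2 &= 5 - 2 \\
x &= 3
\end{aligned}
\]

Here \(f(u) = u - 2\) was applied to both sides.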


1.1.2 Inequalities

Theorem 3 If \(a<b\), then \(a+c < b+c\)


Theorem 4 (Negating both sides of an inequality) If \(a < b\), then \(-a > -b\).


Theorem 5 If \(a < b\) and \(c > 0\), then \(ca < cb\).


Theorem 6 \[-a = (-1)\times a\]
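For example, Theorem 4 follows from Theorem 3: starting from \(a < b\) and adding \(c = -(a+b)\) to both sides,

\[
\begin{aligned}
a - (a+b) &< b - (a+b) \\
-b &< -a
\end{aligned}
\]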


1.1.3 Sums

Theorem 7 (adding zero changes nothing) \[a+0=a\]


Theorem 8 (Sums are symmetric) \[a+b = b+a\]


Theorem 9 (Sums are associative)  

When summing three or more terms, the way you group them does not matter:

\[(a + b) + c = a + (b + c)\]


1.1.4 Products


Theorem 10 (Multiplying by 1 changes nothing) \[a \times 1 = a\]


Theorem 11 (Products are symmetric) \[a \times b = b \times a\]


Theorem 12 (Products are associative) \[(a \times b) \times c = a \times (b \times c)\]

1.1.5 Division

Theorem 13 (Division can be written as a product) \[\frac {a}{b} = a \times \frac{1}{b}\]

1.1.6 Sums and products together


Theorem 14 (Multiplication is distributive) \[a(b+c) = ab + ac\]
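For example, applying distributivity twice expands a product of binomials:

\[
\begin{aligned}
(x+1)(x+2) &= (x+1)x + (x+1)2 \\
&= x^2 + x + 2x + 2 \\
&= x^2 + 3x + 2
\end{aligned}
\]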


1.1.7 Quotients

Definition 1 (Quotients, fractions, rates)  

A quotient, fraction, or rate is a division of one quantity by another:

\[\frac{a}{b}\]

In epidemiology, rates typically have a quantity involving time or population in the denominator.

cf. https://en.wikipedia.org/wiki/Rate_(mathematics)

Definition 2 (Ratios) A ratio is a quotient in which the numerator and denominator are measured using the same unit scales.


Definition 3 (Proportion) In statistics, a “proportion” typically means a ratio where the numerator represents a subset of the denominator.


Definition 4 (Proportional) Two functions \(f(x)\) and \(g(x)\) are proportional if their ratio \(\frac{f(x)}{g(x)}\) does not depend on \(x\). (cf. https://en.wikipedia.org/wiki/Proportionality_(mathematics))


Additional reference for proportions: https://en.wikipedia.org/wiki/Population_proportion#Mathematical_definition


1.2 Exponentials and Logarithms

Theorem 15 (logarithm of a product is the sum of the logs of the factors) \[ \log{(a\cdot b)} = \log{a} + \log{b} \]

Corollary 1 (logarithm of a quotient)  

The logarithm of a quotient is equal to the log of the numerator minus the log of the denominator:

\[\log{\frac{a}{b}} = \log{a} - \log{b}\]

Theorem 16 (logarithm of a power) \[ \text{log}{\left\{a^b\right\}} = b \cdot\text{log}{\left\{a\right\}} \]

Theorem 17 (exponential of a sum)  

The exponential of a sum is equal to the product of the exponentials of the addends:

\[\text{exp}{\left\{a+b\right\}} = \text{exp}{\left\{a\right\}} \cdot\text{exp}{\left\{b\right\}}\]

Corollary 2 (exponential of a difference)  

The exponential of a difference is equal to the quotient of the exponentials of the two terms:

\[\text{exp}{\left\{a-b\right\}} = \frac{\text{exp}{\left\{a\right\}}}{\text{exp}{\left\{b\right\}}}\]


Theorem 18 (exponential of a product) \[a^{bc} = {\left(a^b\right)}^c = {\left(a^c\right)}^b\]


Corollary 3 (natural exponential of a product) \[\text{exp}{\left\{ab\right\}} = (\text{exp}{\left\{a\right\}})^b = (\text{exp}{\left\{b\right\}})^a\]
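These identities are easy to spot-check numerically; here is a minimal sketch in Python (any values \(a, b, c > 0\) would do):

```python
import math

a, b, c = 2.5, 1.7, 1.3

# Theorem 15: the log of a product is the sum of the logs
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))

# Corollary 1: the log of a quotient is the difference of the logs
assert math.isclose(math.log(a / b), math.log(a) - math.log(b))

# Theorem 16: the log of a power is the exponent times the log of the base
assert math.isclose(math.log(a ** b), b * math.log(a))

# Theorem 17: the exp of a sum is the product of the exps
assert math.isclose(math.exp(a + b), math.exp(a) * math.exp(b))

# Theorem 18: a^(bc) = (a^b)^c = (a^c)^b
assert math.isclose(a ** (b * c), (a ** b) ** c)
assert math.isclose(a ** (b * c), (a ** c) ** b)

print("all identities hold (up to floating-point error)")
```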


Exercise 1 For \(a \ge 0,~b,c \in \mathbb{R}\), when does \((a^b)^c = a^{(b^c)}\)?


Solution 1. Short answer: rarely (that’s all you need to know for this course).

Long answer:

If \((a^b)^c = a^{(b^c)}\), then since \((a^b)^c = a^{bc}\), we have: \[a^{bc} = a^{(b^c)}\] \[\text{log}{\left\{a^{bc}\right\}} = \text{log}{\left\{a^{(b^c)}\right\}}\] \[bc \cdot \text{log}{\left\{a\right\}} = b^c\cdot \text{log}{\left\{a\right\}} \tag{1}\]

Equation 1 holds in each of the following cases:

  1. \(bc = b^c\) (see Exercise 2).
  2. \(a=1\) (i.e., \(\text{log}{\left\{a\right\}} = 0\)).
  3. \(a=0\) (i.e., \(\text{log}{\left\{a\right\}}= -\infty\)) and \(\text{sign}{\left\{bc\right\}}=\text{sign}{\left\{b^c\right\}}\).

In particular, when \(a=0\) and \(c=0\), \(bc = 0\) and \(b^c = 1\) (for any \(b \in \mathbb{R}\)), so \(\text{sign}{\left\{bc\right\}}\neq \text{sign}{\left\{b^c\right\}}\), and \((a^b)^c \neq a^{(b^c)}\):

\[ \begin{aligned} (a^b)^c &= (0^b)^0 \\ &= 1 \end{aligned} \]

\[ \begin{aligned} a^{(b^c)} &= 0^{(b^0)} \\ &= 0^1 \\ &= 0 \end{aligned} \]


Exercise 2 For \(b,c \in \mathbb{R}\), when does \(b^c = bc\)?


Solution 2. \(bc = b^c\) in each of the following cases:

  1. \(c = 1\).
  2. \(b=0\) and \(c > 0\).
  3. \(b = \text{exp}{\left\{\frac{\log{c}}{c-1}\right\}}\) (for \(c > 0\), \(c \neq 1\); the case \(c=1\) is covered by case 1).

See the red contours in Figure 2 for a visualization.

Show R code
`b*c_f` <- function(b, c) b*c
`b^c_f` <- function(b, c) b^c
values_b <- seq(0, 5, by = .01)
values_c <- seq(-.5, 3, by = .01)

`b*c` <- outer(values_b, values_c, `b*c_f`)
`b^c` <- outer(values_b, values_c, `b^c_f`)
`b^c`[is.infinite(`b^c`)] <- NA

opacity <- .3
z_min <- min(`b*c`, `b^c`, na.rm = TRUE)
z_max <- 5
plotly::plot_ly(
  x = ~values_b,
  y = ~values_c
) |>
  plotly::add_surface(
    z = ~ t(`b*c`),
    contours = list(
      z = list(
        show = TRUE,
        start = -1,
        end = 1,
        size = .1
      )
    ),
    name = "b*c",
    showscale = FALSE,
    opacity = opacity,
    colorscale = list(c(0, 1), c("green", "green"))
  ) |>
  plotly::add_surface(
    opacity = opacity,
    colorscale = list(c(0, 1), c("red", "red")),
    z = ~ t(`b^c`),
    contours = list(
      z = list(
        show = TRUE,
        start = z_min,
        end = z_max,
        size = .2
      )
    ),
    showscale = FALSE,
    name = "b^c"
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(
        # type = "log",
        title = "b"
      ),
      yaxis = list(
        # type = "log",
        title = "c"
      ),
      zaxis = list(
        # type = "log",
        range = c(z_min, z_max),
        title = "outcome"
      ),
      camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
      aspectratio = list(x = .9, y = .8, z = 0.7)
    )
  )
Figure 1: Graph of \(b*c\) and \(b^c\)
Show R code
`b^c - b*c_f` <- function(b, c) `b^c_f`(b,c) - `b*c_f`(b,c)

mat1 <- outer(values_b, values_c, `b^c - b*c_f`)
mat1[is.infinite(mat1)] <- NA

opacity <- .3
plotly::plot_ly(
  x = ~values_b,
  y = ~values_c
) |>
  plotly::add_surface(
    z = ~ t(mat1),
    contours = list(
      z = list(
        show = TRUE,
        start = 0,
        end = 1,
        size = 1,
        color = "red"
      )
    ),
    name = "b^c - b*c",
    showscale = TRUE,
    opacity = opacity
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(
        # type = "log",
        title = "b"
      ),
      yaxis = list(
        # type = "log",
        title = "c"
      ),
      zaxis = list(
        title = "outcome"
      ),
      camera = list(eye = list(x = -1.25, y = -1.25, z = 0.5)),
      aspectratio = list(x = .9, y = .8, z = 0.7)
    )
  )
Figure 2: Graph of \(b^c - b*c\). Red contour lines show where \(b^c = b*c\).

Theorem 19 (\(\text{exp}{\left\{\right\}}\) and \(\text{log}{\left\{\right\}}\) are mutual inverses) For \(a > 0\): \[\text{exp}{\left\{\text{log}{\left\{a\right\}}\right\}} = \text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\] (The second equality, \(\text{log}{\left\{\text{exp}{\left\{a\right\}}\right\}} = a\), holds for all \(a \in \mathbb{R}\).)

1.3 Derivatives

Theorem 20 (Constant rule) \[\frac{\partial}{\partial x}c = 0\]


Theorem 21 (Constant multiple rule) If \(a\) is constant with respect to \(x\), then: \[\frac{\partial}{\partial x}(ay) = a \frac{\partial y}{\partial x}\]


Theorem 22 (Power rule) \[\frac{\partial}{\partial x}x^q = qx^{q-1}\]


Theorem 23 (Derivative of natural logarithm) \[\text{log}'{\left\{x\right\}} = \frac{1}{x} = x^{-1}\]

Theorem 24 (derivative of exponential) \[\text{exp}'{\left\{x\right\}} = \text{exp}{\left\{x\right\}}\]


Theorem 25 (Product rule) \[(ab)' = ab' + ba'\]

Theorem 26 (Quotient rule) \[(a/b)' = a'/b - (a/b^2)b'\]

Theorem 27 (Chain rule) \[\begin{aligned} \frac{\partial a}{\partial c} &= \frac{\partial a}{\partial b} \frac{\partial b}{\partial c} \\ &= \frac{\partial b}{\partial c} \frac{\partial a}{\partial b} \end{aligned} \]

or in Euler/Lagrange notation:

\[(f(g(x)))' = g'(x) f'(g(x))\]
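The product and chain rules can be spot-checked with a central-difference approximation; a minimal sketch in Python:

```python
import math

def deriv(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.3

# Product rule (Theorem 25): (x^2 sin x)' = x^2 cos x + 2x sin x
approx = deriv(lambda x: x**2 * math.sin(x), x0)
exact = x0**2 * math.cos(x0) + 2 * x0 * math.sin(x0)
assert abs(approx - exact) < 1e-5

# Chain rule (Theorem 27): (exp(x^2))' = 2x exp(x^2)
approx = deriv(lambda x: math.exp(x**2), x0)
exact = 2 * x0 * math.exp(x0**2)
assert abs(approx - exact) < 1e-5
```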


Corollary 4 (Chain rule for logarithms) \[ \frac{\partial}{\partial x}\log{f(x)} = \frac{f'(x)}{f(x)} \]

Proof. Apply Theorem 27 and Theorem 23.
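For example, for \(f(x) = x^2 + 1\):

\[\frac{\partial}{\partial x}\log{\left(x^2+1\right)} = \frac{2x}{x^2+1}\]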


1.4 Linear Algebra

Definition 5 (Dot product/linear combination/inner product) For any two real-valued vectors \(\tilde{x}= (x_1, \ldots, x_n)\) and \(\tilde{y}= (y_1, \ldots, y_n)\), the dot-product, linear combination, or inner product of \(\tilde{x}\) and \(\tilde{y}\) is:

\[\tilde{x}\cdot \tilde{y}= \tilde{x}^{\top} \tilde{y}\stackrel{\text{def}}{=}\sum_{i=1}^nx_i y_i\]

Note

See also the definitions in

“Linear combination” can also refer to weighted sums of vectors, or in other words matrix-vector multiplication.

The dot-product has a different generalization for two matrices; see Wikipedia for more.


Theorem 28 (Dot product is symmetric) The dot product is symmetric:

\[\tilde{x}\cdot \tilde{y}= \tilde{y}\cdot \tilde{x}\]


Proof. Apply Theorem 11 (products are symmetric) to each term of the sum: \[\tilde{x}\cdot \tilde{y}= \sum_{i=1}^nx_i y_i = \sum_{i=1}^ny_i x_i = \tilde{y}\cdot \tilde{x}\]

1.5 Vector Calculus

(adapted from Fieller (2016), §7.2)

Let \(\tilde{x}\) and \(\tilde{\beta}\) be vectors of length \(p\), or in other words, matrices of dimension \(p \times 1\):

\[ \tilde{x}= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \\ \]

\[ \tilde{\beta}= \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \]

Definition 6 (Transpose) The transpose of a column vector is the row vector with the same sequence of entries:

\[ \tilde{x}' \equiv \tilde{x}^\top \equiv [x_1, x_2, ..., x_p] \]

Example 1 (Dot product as matrix multiplication) \[ \begin{aligned} \tilde{x}\cdot \tilde{\beta} &= \tilde{x}^{\top} \tilde{\beta} \\ &= [x_1, x_2, ..., x_p] \begin{bmatrix} \beta_{1} \\ \beta_{2} \\ \vdots \\ \beta_{p} \end{bmatrix} \\ &= x_1\beta_1+x_2\beta_2 +...+x_p \beta_p \end{aligned} \]
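The summation in Example 1 translates directly to code; a minimal sketch in Python:

```python
def dot(x, y):
    """Dot product of two equal-length numeric vectors (Definition 5)."""
    assert len(x) == len(y), "vectors must have the same length"
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

x = [1.0, 2.0, 3.0]
beta = [4.0, 5.0, 6.0]

print(dot(x, beta))                  # 1*4 + 2*5 + 3*6 = 32.0
assert dot(x, beta) == dot(beta, x)  # Theorem 28: the dot product is symmetric
```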

Theorem 29 (Transpose of a sum) \[(\tilde{x}+\tilde{y})^{\top} = \tilde{x}^{\top} + \tilde{y}^{\top}\]


Definition 7 (Vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:

\[ \frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) \\ \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) \\ \vdots \\ \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]


Definition 8 (Row-vector derivative) If \(f(\tilde{\beta})\) is a function that takes a vector \(\tilde{\beta}\) as input, such as \(f(\tilde{\beta}) = \tilde{x}'\tilde{\beta}\), then:

\[ \frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = \begin{bmatrix} \frac{\partial}{\partial \beta_1}f(\tilde{\beta}) & \frac{\partial}{\partial \beta_2}f(\tilde{\beta}) & \cdots & \frac{\partial}{\partial \beta_p}f(\tilde{\beta}) \end{bmatrix} \]


Theorem 30 (Row and column derivatives are transposes) \[\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta})\right)}^{\top}\]

\[\frac{\partial}{\partial \tilde{\beta}} f(\tilde{\beta}) = {\left(\frac{\partial}{\partial \tilde{\beta}^{\top}} f(\tilde{\beta})\right)}^{\top}\]


Theorem 31 (Derivative of a dot product) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{x}\cdot \tilde{\beta}= \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}\cdot \tilde{x}= \tilde{x} \]

This looks a lot like non-vector calculus, except that you have to transpose the coefficient.


Proof. \[ \begin{aligned} \frac{\partial}{\partial \beta} (x^{\top}\beta) &= \begin{bmatrix} \frac{\partial}{\partial \beta_1}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \frac{\partial}{\partial \beta_2}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ \vdots \\ \frac{\partial}{\partial \beta_p}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \end{bmatrix} \\ &= \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{p} \end{bmatrix} \\ &= \tilde{x} \end{aligned} \]
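Theorem 31 can also be verified numerically with finite differences; a minimal sketch in Python using only the standard library:

```python
def grad_fd(f, beta, h=1e-6):
    """Central-difference approximation of the gradient of f at beta."""
    grad = []
    for i in range(len(beta)):
        up, down = beta[:], beta[:]
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

x = [1.0, -2.0, 3.5]
dot_x = lambda beta: sum(x_i * b_i for x_i, b_i in zip(x, beta))  # f(beta) = x . beta

# Theorem 31: the exact gradient is x itself, at any point beta
approx = grad_fd(dot_x, [0.7, 0.1, -1.2])
assert all(abs(g_i - x_i) < 1e-4 for g_i, x_i in zip(approx, x))
```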


Definition 9 (Quadratic form) A quadratic form is a mathematical expression with the structure

\[\tilde{x}^{\top} \mathbf{S} \tilde{x}\]

where \(\tilde{x}\) is a vector and \(\mathbf{S}\) is a matrix with compatible dimensions for vector-matrix multiplication.

Quadratic forms occur frequently in regression models. They are the matrix-vector generalizations of the scalar quadratic form \(cx^2 = xcx\).


Theorem 32 (Derivative of a quadratic form) If \(S\) is a symmetric \(p\times p\) matrix that is constant with respect to \(\beta\), then:

\[ \frac{\partial}{\partial \beta} \beta'S\beta = 2S\beta \]

This is like taking the derivative of \(cx^2\) with respect to \(x\) in non-vector calculus.


Corollary 5 (Derivative of a simple quadratic form) \[ \frac{\partial}{\partial \tilde{\beta}} \tilde{\beta}'\tilde{\beta}= 2\tilde{\beta} \]

This is like taking the derivative of \(x^2\).
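A finite-difference spot-check of Theorem 32, using a small symmetric \(S\); a minimal sketch in Python:

```python
def grad_fd(f, beta, h=1e-6):
    """Central-difference approximation of the gradient of f at beta."""
    grad = []
    for i in range(len(beta)):
        up, down = beta[:], beta[:]
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# a symmetric 2x2 matrix S (symmetry matters: for a general S the gradient is (S + S')beta)
S = [[2.0, 0.5],
     [0.5, 3.0]]
beta = [1.0, -2.0]

def quad_form(b):
    """beta' S beta = sum over i, j of b_i * S_ij * b_j"""
    return sum(b[i] * S[i][j] * b[j] for i in range(2) for j in range(2))

# Theorem 32: the exact gradient is 2 S beta
exact = [2 * sum(S[i][j] * beta[j] for j in range(2)) for i in range(2)]
approx = grad_fd(quad_form, beta)
assert all(abs(a - e) < 1e-4 for a, e in zip(approx, exact))
```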


Theorem 33 (Vector chain rule) \[\frac{\partial z}{\partial \tilde{x}} = \frac{\partial y}{\partial \tilde{x}} \frac{\partial z}{\partial y}\]

or in Euler/Lagrange notation:

\[(f(g(\tilde{x})))' = \tilde{g}'(\tilde{x}) f'(g(\tilde{x}))\]

See https://quickfem.com/finite-element-analysis/, specifically https://quickfem.com/wp-content/uploads/IFEM.AppF_.pdf

See also https://en.wikipedia.org/wiki/Gradient#Relationship_with_Fr%C3%A9chet_derivative

This chain rule is like the univariate chain rule (Theorem 27), but the order matters now. The version presented here is for the gradient (column vector); the total derivative (row vector) would be the transpose of the gradient.


Corollary 6 (Vector chain rule for quadratic forms) \[\frac{\partial}{\partial \tilde{\beta}}{{\left(\tilde{\varepsilon}(\tilde{\beta})\cdot \tilde{\varepsilon}(\tilde{\beta})\right)}} = {\left(\frac{\partial}{\partial \tilde{\beta}}\tilde{\varepsilon}(\tilde{\beta})\right)} {\left(2 \tilde{\varepsilon}(\tilde{\beta})\right)}\]

1.6 Additional resources

1.6.1 Calculus

1.6.2 Linear Algebra and Vector Calculus

  • Fieller (2016)
  • Banerjee and Roy (2014)
  • Searle and Khuri (2017)

1.6.3 Numerical Analysis

1.6.4 Real Analysis


References

Banerjee, Sudipto, and Anindya Roy. 2014. Linear Algebra and Matrix Analysis for Statistics. Vol. 181. Boca Raton: CRC Press. https://www.routledge.com/Linear-Algebra-and-Matrix-Analysis-for-Statistics/Banerjee-Roy/p/book/9781420095388.
Banner, Adrian D. 2007. The Calculus Lifesaver : All the Tools You Need to Excel at Calculus. A Princeton Lifesaver Study Guide. Princeton, New Jersey: Princeton University Press. https://press.princeton.edu/books/paperback/9780691130880/the-calculus-lifesaver.
Cheng, Eugenia. 2025. “Opinion | How Math Turned Me from a D.E.I. Skeptic to a Supporter.” The New York Times. https://www.nytimes.com/2025/09/05/opinion/math-dei.html.
Dobson, Annette J, and Adrian G Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. CRC press. https://doi.org/10.1201/9781315182780.
Fieller, Nick. 2016. Basics of Matrix Algebra for Statistics with R. Chapman & Hall/CRC. https://doi.org/10.1201/9781315370200.
Grinberg, Raffi. 2017. The Real Analysis Lifesaver: All the Tools You Need to Understand Proofs. 1st ed. Princeton Lifesaver Study Guides. Princeton: Princeton University Press. https://press.princeton.edu/books/paperback/9780691172934/the-real-analysis-lifesaver.
Kaplan, Daniel. 2022. MOSAIC Calculus. www.mosaic-web.org.
Khuri, André I. 2003. Advanced Calculus with Applications in Statistics. John Wiley & Sons. https://www.wiley.com/en-us/Advanced+Calculus+with+Applications+in+Statistics%2C+2nd+Edition-p-9780471391043.
Miller, Steven J. 2016. The Probability Lifesaver: Calculus Review Problems. https://web.williams.edu/Mathematics/sjmiller/public_html/probabilitylifesaver/index.htm#:~:text=http%3A//web.williams.edu/Mathematics/sjmiller/public_html/probabilitylifesaver/supplementalchap_calcreview.pdf.
Searle, Shayle R, and Andre I Khuri. 2017. Matrix Algebra Useful for Statistics. John Wiley & Sons.